Project 5b: Diffusion Models from Scratch!

Implementing and training diffusion models step by step

Part 1: Training a Single-Step Denoising UNet

Following the method introduced in the paper, I implemented a single-step denoising UNet and trained it on the MNIST dataset. The training loss curve is shown below:

Training Loss Curve
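Concretely, a training loop for the single-step denoiser looks roughly like the sketch below. This is a minimal sketch rather than my exact implementation: the noising process z = x + α·ε, the fixed training noise level α = 0.5, the tiny stand-in network, and the hyperparameters are all illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import datasets, transforms

device = "cuda" if torch.cuda.is_available() else "cpu"

# Tiny stand-in for the denoising UNet described above (the real model has
# down/up blocks with skip connections; this just keeps the sketch runnable).
unet = nn.Sequential(
    nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 1, 3, padding=1),
).to(device)
opt = torch.optim.Adam(unet.parameters(), lr=1e-4)

loader = torch.utils.data.DataLoader(
    datasets.MNIST("data", train=True, download=True,
                   transform=transforms.ToTensor()),
    batch_size=256, shuffle=True)

def add_noise(x, alpha):
    """Noising process assumed here: z = x + alpha * eps, eps ~ N(0, I)."""
    return x + alpha * torch.randn_like(x)

for epoch in range(5):
    for x, _ in loader:                 # labels are unused for plain denoising
        x = x.to(device)
        z = add_noise(x, alpha=0.5)     # fixed training noise level (assumption)
        loss = F.mse_loss(unet(z), x)   # L2 loss against the clean image
        opt.zero_grad()
        loss.backward()
        opt.step()
```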

A visualization of the noising process over noise levels α = 0.0, 0.2, 0.4, 0.6, 0.8, 1.0 is shown below:

Noising Process Visualization

I then visualized denoised results on the test set during training, after the 1st and after the 5th epoch. The results are shown below:

Denoised Results After 1st Epoch Denoised Results After 5th Epoch

Finally, I visualized the denoiser's results on test-set digits at varying noise levels α = 0.0, 0.2, 0.4, 0.6, 0.8, 1.0. The results are shown below:

Denoiser Results with Varying Alpha
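Such a sweep can be produced roughly as follows, reusing the `unet` and `add_noise` from the sketch above; the plotting layout is illustrative.

```python
import matplotlib.pyplot as plt

alphas = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
test_set = datasets.MNIST("data", train=False, download=True,
                          transform=transforms.ToTensor())
x = test_set[0][0][None].to(device)      # one test digit, shape (1, 1, 28, 28)

fig, axes = plt.subplots(2, len(alphas), figsize=(12, 4))
with torch.no_grad():
    for i, a in enumerate(alphas):
        z = add_noise(x, a)
        axes[0, i].imshow(z[0, 0].cpu(), cmap="gray")        # noisy input
        axes[1, i].imshow(unet(z)[0, 0].cpu(), cmap="gray")  # denoised output
        axes[0, i].set_title(f"alpha = {a}")
        axes[0, i].axis("off"); axes[1, i].axis("off")
plt.show()
```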

Part 2: Training a Diffusion Model

2.1 Adding Time Conditioning to UNet

Time conditioning involves encoding the timestep t into a latent representation and injecting it into the UNet to inform the model about the stage of the denoising process, as introduced in the paper.
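One common way to wire this up, and the scheme assumed in the sketch below, is a small MLP (Linear → GELU → Linear) that maps the normalized timestep to a per-channel vector, which is broadcast-added to an intermediate feature map. The module names and the tiny stand-in network are illustrative, not my exact architecture:

```python
import torch
import torch.nn as nn

class FCBlock(nn.Module):
    """Embed a scalar timestep t (normalized to [0, 1]) into a per-channel vector."""
    def __init__(self, out_ch):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(1, out_ch), nn.GELU(),
                                 nn.Linear(out_ch, out_ch))

    def forward(self, t):
        # (B,) -> (B, out_ch, 1, 1) so it broadcasts over spatial dimensions
        return self.net(t[:, None])[:, :, None, None]

class TimeConditionedNet(nn.Module):
    """Tiny stand-in for the time-conditioned UNet: the timestep embedding
    is added to an intermediate feature map inside the forward pass."""
    def __init__(self, ch=32):
        super().__init__()
        self.inp = nn.Conv2d(1, ch, 3, padding=1)
        self.t_embed = FCBlock(ch)
        self.out = nn.Conv2d(ch, 1, 3, padding=1)

    def forward(self, x, t):
        h = torch.relu(self.inp(x))
        h = h + self.t_embed(t)   # inject "how far along denoising are we?"
        return self.out(h)
```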

2.2 Training the Time-Conditioned UNet

I trained the time-conditioned UNet on the MNIST dataset. The training loss curve is shown below:

Time-Conditioned Training Loss Curve
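The training step follows the standard DDPM recipe: draw a random timestep, noise the clean image with the cumulative-product schedule ᾱ_t, and regress the injected noise. This sketch reuses `loader` and the imports from Part 1; the schedule hyperparameters (T = 300, linear β from 1e-4 to 0.02) are illustrative:

```python
T = 300                                            # diffusion steps (illustrative)
betas = torch.linspace(1e-4, 0.02, T).to(device)   # linear DDPM beta schedule
alpha_bars = torch.cumprod(1.0 - betas, dim=0)

unet = TimeConditionedNet().to(device)
opt = torch.optim.Adam(unet.parameters(), lr=1e-3)

for epoch in range(20):
    for x0, _ in loader:
        x0 = x0.to(device)
        t = torch.randint(0, T, (x0.shape[0],), device=device)
        eps = torch.randn_like(x0)
        ab = alpha_bars[t][:, None, None, None]
        xt = ab.sqrt() * x0 + (1 - ab).sqrt() * eps   # forward (noising) process
        # The model predicts the injected noise, conditioned on the timestep.
        loss = F.mse_loss(unet(xt, t.float() / T), eps)
        opt.zero_grad()
        loss.backward()
        opt.step()
```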

2.3 Sampling from the Model

I then sampled from the time-conditioned model trained above, after 5 epochs and after 20 epochs of training:

Samples After 5 Epochs Samples After 20 Epochs
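Sampling runs the learned reverse process: start from pure noise and repeatedly remove the predicted noise, adding fresh noise at every step except the last. Below is a sketch of standard DDPM ancestral sampling (Ho et al.), reusing the schedule defined in 2.2; the exact update I used may differ slightly:

```python
@torch.no_grad()
def sample(unet, n=16):
    alphas = 1.0 - betas
    x = torch.randn(n, 1, 28, 28, device=device)   # start from pure noise
    for i in reversed(range(T)):
        t = torch.full((n,), i, device=device)
        eps = unet(x, t.float() / T)               # predicted noise at step i
        # DDPM mean: remove this step's predicted noise contribution
        x = (x - (1 - alphas[i]) / (1 - alpha_bars[i]).sqrt() * eps) / alphas[i].sqrt()
        if i > 0:
            x = x + betas[i].sqrt() * torch.randn_like(x)  # fresh noise, except at t=0
    return x

samples = sample(unet)   # run e.g. after 5 and after 20 epochs of training
```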

2.4 Class Conditioning

Class conditioning allows the UNet to generate images conditioned on a specific class (e.g., digits 0–9 for MNIST). This is achieved by injecting a one-hot encoded class vector into the network. The training loss curve is shown below:

Class Conditioning Training Loss Curve
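In the sketch below, the one-hot class vector is additionally zeroed out for a small fraction of training examples (10% here, an illustrative choice), so the model also learns an unconditional mode that classifier-free guidance can exploit at sampling time; `unet(xt, t, c)` stands for a hypothetical class-conditioned variant of the model from 2.1:

```python
p_uncond = 0.1   # fraction of samples trained without class info (illustrative)

def one_hot_with_dropout(y, num_classes=10):
    """One-hot encode labels, zeroing a random fraction so the model also
    learns an unconditional mode (used by classifier-free guidance)."""
    c = F.one_hot(y, num_classes).float()
    drop = torch.rand(y.shape[0], device=y.device) < p_uncond
    c[drop] = 0.0
    return c

# Class-conditioned training step, otherwise identical to 2.2:
#   c = one_hot_with_dropout(y.to(device))
#   loss = F.mse_loss(unet(xt, t.float() / T, c), eps)
```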

2.5 Sampling from the Class-Conditioned Model

Finally, I sampled from the class-conditioned model trained above, again after 5 epochs and after 20 epochs of training:

Class-Conditioned Samples After 5 Epochs Class-Conditioned Samples After 20 Epochs
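A sketch of the guided sampler: each step queries the model twice, once with the class vector and once with it zeroed, and extrapolates toward the conditional prediction. The guidance scale of 5.0 is an illustrative value, and `unet` is the hypothetical class-conditioned model from above:

```python
@torch.no_grad()
def sample_class(unet, digit, n=16, guidance=5.0):
    alphas = 1.0 - betas
    c = F.one_hot(torch.full((n,), digit, device=device), 10).float()
    x = torch.randn(n, 1, 28, 28, device=device)
    for i in reversed(range(T)):
        t = torch.full((n,), i, device=device).float() / T
        eps_c = unet(x, t, c)                       # class-conditional prediction
        eps_u = unet(x, t, torch.zeros_like(c))     # unconditional prediction
        eps = eps_u + guidance * (eps_c - eps_u)    # classifier-free guidance
        x = (x - (1 - alphas[i]) / (1 - alpha_bars[i]).sqrt() * eps) / alphas[i].sqrt()
        if i > 0:
            x = x + betas[i].sqrt() * torch.randn_like(x)
    return x

fours = sample_class(unet, digit=4)   # e.g. 16 samples of the digit 4
```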